In this paper, we propose a new multi-objective contextual multi-armed bandit problem with two objectives, where one objective dominates the other. Unlike single-objective bandit problems, in which the learner obtains a random scalar reward for each arm it selects, in the proposed problem the learner obtains a random reward vector, where each component of the reward vector corresponds to one of the objectives and the distribution of the reward depends on the context provided to the learner at the beginning of each round. We call this problem the contextual multi-armed bandit with a dominant objective (CMAB-DO). In CMAB-DO, the goal of the learner is to maximize its total reward in the non-dominant objective while ensuring that it maximizes its total reward in the dominant objective. In this case, the optimal arm for a given context is the one that maximizes the expected reward in the non-dominant objective among all arms that maximize the expected reward in the dominant objective. First, we show that the optimal arm lies in the Pareto front. Then, we propose the multi-objective contextual multi-armed bandit algorithm (MOC-MAB) and define two performance measures: the 2-dimensional (2D) regret and the Pareto regret. We show that both the 2D regret and the Pareto regret of MOC-MAB are sublinear in the number of rounds. We also compare the performance of the proposed algorithm with other state-of-the-art methods on synthetic and real-world datasets. The proposed model and algorithm have a wide range of real-world applications that involve multiple and possibly conflicting objectives, ranging from wireless communication to medical diagnosis and recommender systems.
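The lexicographic notion of optimality described above (maximize the non-dominant objective among the arms that maximize the dominant one) can be sketched as follows. This is a minimal illustration, not the paper's algorithm; the function name, the two-column reward layout, and the tie tolerance `tol` are assumptions introduced here for clarity.

```python
import numpy as np

def lexicographic_optimal_arm(expected_rewards, tol=1e-9):
    """Select the optimal arm under a dominant objective.

    expected_rewards: (K, 2) array of per-arm expected rewards;
    column 0 is the dominant objective, column 1 the non-dominant
    one. Among all arms whose dominant-objective reward is within
    `tol` of the maximum, return the index of the arm with the
    highest non-dominant reward (an illustrative tolerance, not
    part of the original formulation).
    """
    r = np.asarray(expected_rewards, dtype=float)
    best_dominant = r[:, 0].max()
    candidates = np.flatnonzero(r[:, 0] >= best_dominant - tol)
    return int(candidates[np.argmax(r[candidates, 1])])

# Three arms: arms 0 and 2 tie in the dominant objective,
# but arm 2 is better in the non-dominant one.
rewards = [[0.9, 0.1], [0.5, 0.8], [0.9, 0.4]]
print(lexicographic_optimal_arm(rewards))  # → 2
```

Note that the selected arm is Pareto optimal by construction: no other arm can improve one objective without losing in the other, which matches the claim that the optimal arm lies in the Pareto front.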